feat(data-sanitization): pattern groups, strict matching, cookieAndFormEncodedMatcher#321
Conversation
…rmEncodedMatcher (#321) - Add PatternEntry type (string | { match: string; strict?: boolean }) for exact vs. substring field-name control - Add credentialPatterns, headerPatterns, piiPatterns, phiPatterns constants; export all from index - defaultPatterns now includes headerPatterns (authorization, api-key) in addition to credentialPatterns - Rename formEncodedMatcher → cookieAndFormEncodedMatcher; stops at both & and ; so it handles URL form-encoded and HTTP Cookie headers in one matcher - Add strict param to all three matchers; cookie matcher uses a negative lookbehind (?<![\w-]) to reject substring matches in strict mode - normalizeEntry helper in replacers.ts handles PatternEntry in all contexts - ignorePatterns filter updated to match against normalizeEntry(p).match - objectReplacer key matchers use ^pattern$ for strict, [\w-]*pattern[\w-]* for non-strict - Add test/constants.test.ts; expand matchers.test.ts with cookie-style, strict mode, and renamed cookieAndFormEncodedMatcher tests (262 tests total) - Update README and CLAUDE.md to reflect renamed matcher, pattern groups, PatternEntry type, and piiPatterns/phiPatterns usage examples Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…edMatcher - Rename 'Form-encoded matcher and multiline strings' section to 'Cookie and form-encoded matcher and multiline strings' - Update value stop-char description to mention & and ; both - Remove outdated claim that string removal is 10-20% slower than masking; benchmarks show cost is comparable - Widen cold-start ratio to 15-32x range to reflect hardware variance Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR refactors the data-sanitization pattern system to support granular categorization and strict exact-matching for field names. It introduces ChangesPattern categorization and strict field-name matching
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Coverage Report for packages/data-sanitization-log-providers
File CoverageNo changed files found. |
…Replacer Add two tests for object-form customPatterns entries: - strict: true → exact key match only (covers the ^pattern$ branch) - strict omitted → substring match (covers the strict ?? false branch) Restores 100% branch coverage. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Coverage Report for packages/data-sanitization
File Coverage
|
||||||||||||||||||||||||||||||||||||||||||||||||||
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
packages/data-sanitization/README.md (1)
255-277:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix the PII/PHI example output for
health_card.The example masks
health_card, but that key is not inpiiPatterns/phiPatternsshown in this PR, so this output is likely incorrect. Either add a matching pattern in the example config or update the expected output.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/data-sanitization/README.md` around lines 255 - 277, The README example is inconsistent: the patient object includes health_card but neither piiPatterns nor phiPatterns (used in sanitizeData with customPatterns) define a pattern for the "health_card" key, so the expected masked output is wrong; update the example by either adding a matching pattern for "health_card" to the patterns array referenced (piiPatterns or phiPatterns) or change the expected output to show health_card unmasked, and ensure you reference the same symbols (patient, sanitizeData, customPatterns, piiPatterns, phiPatterns, health_card) so the example and config stay in sync.packages/data-sanitization/test/matchers.test.ts (2)
268-279:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winTighten the semicolon delimiter test to assert the intended behavior.
Line 277 only checks
toContain(';'), which passes whether;is treated as a delimiter or consumed as value content. This test currently won’t catch regressions in delimiter handling.Proposed test assertion update
- it('should not treat a semicolon as a field delimiter', () => { + it('should treat a semicolon as a field delimiter', () => { // Arrange const matcher = cookieAndFormEncodedMatcher('password'); const testData = 'password=secret;username=mark'; // Act const allMatches = [...testData.matchAll(matcher)]; - // Assert — semicolons are not in the stop-character set; value captures past the semicolon + // Assert expect(allMatches.length).toBe(1); - expect(allMatches[0]?.[0]).toContain(';'); + expect(allMatches[0]?.[0]).toBe('password=secret;'); + expect(allMatches[0]?.[1]).toBe('password='); + expect(allMatches[0]?.[2]).toBe(';'); });As per coding guidelines
**/*.test.{js,ts,jsx,tsx}: Unit tests should follow best practices for test organization, assertions, and coverage.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/data-sanitization/test/matchers.test.ts` around lines 268 - 279, The test for cookieAndFormEncodedMatcher is too weak — replace the loose toContain(';') assertion with precise checks that the regex actually captures the semicolon as part of the value: assert the full match equals 'password=secret;username=mark' (allMatches[0]?.[0]) and/or assert the value capture group equals 'secret;username=mark' (allMatches[0]?.[1]) so the test fails if the semicolon becomes a delimiter.
230-240:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winReplace permissive assertions with deterministic expectations.
Line 239 (
>= 1) is overly permissive, and Line 897 (>= 0) is always true. These assertions don’t reliably validate matcher behavior.Proposed assertion tightening
it('should match a field with an empty value', () => { @@ - // Assert — document current behavior when the value is empty - expect(allMatches.length).toBeGreaterThanOrEqual(1); + // Assert + expect(allMatches.length).toBe(1); + expect(allMatches[0]?.[0]).toBe('password=&'); }); @@ it('should handle values that contain escaped backslashes', () => { @@ - // Assert — the \\\\ in the value may interact with the regex stop pattern; document current behavior - expect(allMatches.length).toBeGreaterThanOrEqual(0); + // Assert + expect(allMatches.length).toBe(1); + expect(allMatches[0]?.[0]).toBe('\\"password\\":\\"sec\\\\ret\\"'); });As per coding guidelines
**/*.test.{js,ts,jsx,tsx}: Unit tests should follow best practices for test organization, assertions, and coverage.Also applies to: 887-898
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/data-sanitization/test/matchers.test.ts` around lines 230 - 240, The test using cookieAndFormEncodedMatcher('password') is asserting a permissive expect(allMatches.length).toBeGreaterThanOrEqual(1) which is non-deterministic; change it to assert exact, deterministic behavior by checking that allMatches.length equals the expected count (e.g., 1) and verify the matched result content (e.g., the captured name is "password" and the captured value is the empty string) using the iterator result from testData.matchAll(matcher). Replace the >= assertions in this test and the similar assertions around lines 887–898 with precise expectations for length and captured groups so the matcher behavior is unambiguous.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/data-sanitization/src/index.ts`:
- Around line 14-20: The root module index.ts currently re-exports
credentialPatterns, defaultPatterns, headerPatterns, phiPatterns, and
piiPatterns which expands the public API; revert index.ts to export only the
sanitizeData function and remove those named exports from this file, and instead
expose them from a dedicated submodule (e.g., a new patterns or constants
entrypoint) so consumers who need
credentialPatterns/defaultPatterns/headerPatterns/phiPatterns/piiPatterns import
them from that subpath rather than from the package root; update any internal
imports to reference the new submodule and ensure sanitizeData remains the sole
export from packages/data-sanitization/src/index.ts.
In `@packages/data-sanitization/src/replacers.ts`:
- Around line 19-24: Add unit tests to cover the uncovered branches by
exercising normalizeEntry and objectReplacer: write tests that pass a
PatternEntry as an object to normalizeEntry (with and without the strict
property) and assert it returns {match, strict} with strict defaulting to false
when omitted; also add tests for objectReplacer that verify strict key matching
semantics by using patterns wrapped with ^...$ (should only match exact keys)
versus plain substring patterns (should match keys containing the substring).
Target the normalizeEntry function and the objectReplacer behavior to trigger
both branches reported as missing.
---
Outside diff comments:
In `@packages/data-sanitization/README.md`:
- Around line 255-277: The README example is inconsistent: the patient object
includes health_card but neither piiPatterns nor phiPatterns (used in
sanitizeData with customPatterns) define a pattern for the "health_card" key, so
the expected masked output is wrong; update the example by either adding a
matching pattern for "health_card" to the patterns array referenced (piiPatterns
or phiPatterns) or change the expected output to show health_card unmasked, and
ensure you reference the same symbols (patient, sanitizeData, customPatterns,
piiPatterns, phiPatterns, health_card) so the example and config stay in sync.
In `@packages/data-sanitization/test/matchers.test.ts`:
- Around line 268-279: The test for cookieAndFormEncodedMatcher is too weak —
replace the loose toContain(';') assertion with precise checks that the regex
actually captures the semicolon as part of the value: assert the full match
equals 'password=secret;username=mark' (allMatches[0]?.[0]) and/or assert the
value capture group equals 'secret;username=mark' (allMatches[0]?.[1]) so the
test fails if the semicolon becomes a delimiter.
- Around line 230-240: The test using cookieAndFormEncodedMatcher('password') is
asserting a permissive expect(allMatches.length).toBeGreaterThanOrEqual(1) which
is non-deterministic; change it to assert exact, deterministic behavior by
checking that allMatches.length equals the expected count (e.g., 1) and verify
the matched result content (e.g., the captured name is "password" and the
captured value is the empty string) using the iterator result from
testData.matchAll(matcher). Replace the >= assertions in this test and the
similar assertions around lines 887–898 with precise expectations for length and
captured groups so the matcher behavior is unambiguous.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: cdf1af13-a5df-48e1-bd0c-b6a89454072b
📒 Files selected for processing (10)
CLAUDE.mdpackages/data-sanitization/README.mdpackages/data-sanitization/docs/performance.mdpackages/data-sanitization/src/constants.tspackages/data-sanitization/src/index.tspackages/data-sanitization/src/matchers.tspackages/data-sanitization/src/replacers.tspackages/data-sanitization/src/types.tspackages/data-sanitization/test/constants.test.tspackages/data-sanitization/test/matchers.test.ts
Overview
Adds pattern groups, strict field-name matching, and unifies the cookie/form-encoded matcher into one.
Details
PatternEntrytype —customPatternsnow acceptsstring | { match: string; strict?: boolean }. Plain strings continue to use substring matching; objects withstrict: truematch only the exact field name.credentialPatterns,headerPatterns,piiPatterns, andphiPatterns. All four are exported from the package.defaultPatternsnow combines credentials + headers (addsauthorizationandapi-keyto the defaults).cookieAndFormEncodedMatcher— RenamesformEncodedMatcher; stops at both∧so it handles URL form-encoded strings and HTTP Cookie headers in one matcher. Strict mode uses a negative lookbehind(?<![\w-])to reject substring matches (e.g.tokendoes not match insidesession_token=abc).normalizeEntryhelper normalisesPatternEntryto{ match, strict }for use inbuildPatterns,buildStringScanRegexes, andobjectReplacerkey matching.ignorePatternsfilter updated to match against thematchstring of each entry, so it works correctly with object-form patterns.constants.test.tsadded;matchers.test.tsexpanded with cookie-style, strict mode, and renamed matcher tests (232 → 262 tests).PatternEntrytype in options table,piiPatterns/phiPatternsusage examples, and renamed matcher description.performance.mdupdated to reflect renamed matcher, revisedremoveMatchesstring characterisation, and hardware-range cold-start ratio. CLAUDE.md updated to reflect the renamed matcher and new constants structure.Related Tickets and/or Pull Requests
Relates to #320
Checklist
Breaking change:
formEncodedMatcheris renamed tocookieAndFormEncodedMatcher. Callers importing the named export directly will need to update the import name.🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Tests